NSF PAR Search | NSF Public Access Repository

HONeYBEE: enabling scalable multimodal AI in oncology through foundation model-driven embeddings

https://doi.org/10.1038/s41746-025-02003-4

Tripathi, Aakash; Waqas, Asim; Schabath, Matthew B; Yilmaz, Yasin; Rasool, Ghulam (October 2025, npj Digital Medicine)
Using consensus-based reasoning and large language models to extract structured data from surgical pathology reports

https://doi.org/10.1016/j.labinv.2025.104272

Tripathi, Aakash; Waqas, Asim; Venkatesan, Kavya; Ullah, Ehsan; Khan, Asma; Khalil, Farah; Chen, Wei-Shen; Ozturk, Zarifa Gahramanli; Saeed-Vafa, Daryoush; Bui, Marilyn M; et al. (December 2025, Laboratory Investigation)

Free, publicly-accessible full text available December 1, 2026
Self-Normalizing Multi-Omics Neural Network for Pan-Cancer Prognostication

https://doi.org/10.3390/ijms26157358

Waqas, Asim; Tripathi, Aakash; Ahmed, Sabeen; Mukund, Ashwin; Farooq, Hamza; Johnson, Joseph O; Stewart, Paul A; Naeini, Mia; Schabath, Matthew B; Rasool, Ghulam (August 2025, International Journal of Molecular Sciences)

Free, publicly-accessible full text available August 1, 2026
Multimodal data integration for oncology in the era of deep neural networks: a review

https://doi.org/10.3389/frai.2024.1408843

Waqas, Asim; Tripathi, Aakash; Ramachandran, Ravi P; Stewart, Paul A; Rasool, Ghulam (July 2024, Frontiers in Artificial Intelligence)

Cancer research encompasses data across various scales, modalities, and resolutions, from screening and diagnostic imaging to digitized histopathology slides to various types of molecular data and clinical records. The integration of these diverse data types for personalized cancer care and predictive modeling holds the promise of enhancing the accuracy and reliability of cancer screening, diagnosis, and treatment. Traditional analytical methods, which often focus on isolated or unimodal information, fall short of capturing the complex and heterogeneous nature of cancer data. The advent of deep neural networks has spurred the development of sophisticated multimodal data fusion techniques capable of extracting and synthesizing information from disparate sources. Among these, Graph Neural Networks (GNNs) and Transformers have emerged as powerful tools for multimodal learning, demonstrating significant success. This review presents the foundational principles of multimodal learning including oncology data modalities, taxonomy of multimodal learning, and fusion strategies. We delve into the recent advancements in GNNs and Transformers for the fusion of multimodal data in oncology, spotlighting key studies and their pivotal findings. We discuss the unique challenges of multimodal learning, such as data heterogeneity and integration complexities, alongside the opportunities it presents for a more nuanced and comprehensive understanding of cancer. Finally, we present some of the latest comprehensive multimodal pan-cancer data sources. By surveying the landscape of multimodal data integration in oncology, our goal is to underline the transformative potential of multimodal GNNs and Transformers. Through technological advancements and the methodological innovations presented in this review, we aim to chart a course for future research in this promising field. 
This review may be the first that highlights the current state of multimodal modeling applications in cancer using GNNs and transformers, presents comprehensive multimodal oncology data sources, and sets the stage for multimodal evolution, encouraging further exploration and development in personalized cancer care.
Full Text Available
Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets

https://doi.org/10.3390/s24051634

Tripathi, Aakash; Waqas, Asim; Venkatesan, Kavya; Yilmaz, Yasin; Rasool, Ghulam (March 2024, Sensors)

The advancements in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need for integrating data from multiple sources is further pronounced in complex diseases such as cancer for enabling precision medicine and personalized treatments. This work proposes Multimodal Integration of Oncology Data System (MINDS)—a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS consolidates over 41,000 cases from across repositories while achieving a high compression ratio relative to the 3.78 PB source data size. It offers sub-5-s query response times for interactive exploration. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee pipelines’ scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.
Full Text Available
Transformers in Time-Series Analysis: A Tutorial

https://doi.org/10.1007/s00034-023-02454-8

Ahmed, Sabeen; Nielsen, Ian E; Tripathi, Aakash; Siddiqui, Shamoon; Ramachandran, Ravi P; Rasool, Ghulam (December 2023, Circuits, Systems, and Signal Processing)

Full Text Available
Robust Multimodal Fusion for Survival Prediction in Cancer Patients

https://doi.org/10.1177/11769351251376192

Flack, Dominic; Tripathi, Aakash; Waqas, Asim; Rasool, Ghulam; Dera, Dimah (January 2025, Cancer Informatics)

Full Text Available
